Scripting: exercises

Can you write a series of scripts that feed into one another, each consuming the output of the previous one? Ideally, each script accepts a stream of text as input and writes another stream of text as output. If you don't know how to read a stream of text as input in Python, recall how we used the sys module to write to stderr: the same module also gives access to the standard input stream.
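If you need a starting point, here is a minimal sketch of such a stream filter. The file name upper.py and the upper-casing step are placeholders chosen for illustration, not part of the exercise; the point is only the pattern of reading from sys.stdin, writing data to sys.stdout and sending diagnostics to sys.stderr.

```python
#!/usr/bin/env python
"""upper.py: hypothetical example filter that upper-cases every input line."""
import sys


def transform(lines):
    """Yield each input line upper-cased, without its trailing newline."""
    for line in lines:
        yield line.rstrip("\n").upper()


def main(instream=None, outstream=None, errstream=None):
    # Fall back to the standard streams when none are given explicitly.
    instream = sys.stdin if instream is None else instream
    outstream = sys.stdout if outstream is None else outstream
    errstream = sys.stderr if errstream is None else errstream
    count = 0
    for line in transform(instream):
        outstream.write(line + "\n")
        count += 1
    # Diagnostics go to stderr so they never end up in the data stream.
    errstream.write(f"processed {count} lines\n")


if __name__ == "__main__":
    main()
```

Saved as a text file, it could be chained with other scripts in a notebook cell, for example with !cat some_file.txt | python upper.py | python next_script.py, where next_script.py stands for whichever script comes next in your chain.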

Hint: you should write the scripts as text files and then run them in a notebook cell using either the ! or %%bash notation.

Hint: you could reuse the classes from the OOP exercises.

Exercise 1: biological sequence manipulation

Can you write (and chain) the following scripts:

  • read a nucleotide FASTA file and output a restricted region of the sequence based on a criterion of your choice
    • For instance you could restrict by position
    • You could add options to only output one strand, or both
  • read a series of nucleotide sequences and output all possible reading-frame translations, in FASTA format
  • read a series of translated nucleotide sequences and filter out those with a stop codon in the middle of the sequence
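One possible shape for the first script is sketched below. The file name subseq.py, the option names (--start, --end, --strand) and the 1-based inclusive coordinates are all assumptions made for this illustration; you are free to pick another criterion or interface.

```python
#!/usr/bin/env python
"""subseq.py: hypothetical script that cuts a region out of each FASTA record."""
import argparse
import sys

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def read_fasta(lines):
    """Yield (header, sequence) tuples from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def reverse_complement(seq):
    """Return the reverse complement of a nucleotide sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def extract(seq, start=1, end=None, strand="forward"):
    """Return positions start..end (1-based, inclusive), optionally reverse-complemented."""
    region = seq[start - 1:end]
    return reverse_complement(region) if strand == "reverse" else region


def main():
    parser = argparse.ArgumentParser(description="Cut a region out of FASTA records.")
    parser.add_argument("--start", type=int, default=1, help="1-based start position")
    parser.add_argument("--end", type=int, default=None, help="inclusive end position")
    parser.add_argument("--strand", choices=["forward", "reverse", "both"],
                        default="forward")
    args = parser.parse_args()
    strands = ["forward", "reverse"] if args.strand == "both" else [args.strand]
    for header, seq in read_fasta(sys.stdin):
        name = header.split()[0] if header else "unnamed"
        for strand in strands:
            region = extract(seq, args.start, args.end, strand)
            print(f">{name}|{args.start}-{args.end or len(seq)}|{strand}\n{region}")


if __name__ == "__main__":
    main()
```

The parsing and slicing live in small functions so that the later scripts in the chain (or the classes from the OOP exercises) can reuse them.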

You can use ../data/proteome.faa and ../data/genome.fasta as input files.
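For the translation and stop-codon steps, a hedged sketch could look like the following. It assumes the standard genetic code, treats "in the middle" as "anywhere before the last residue", and uses a hypothetical file name (translate.py) and flag (--no-internal-stop); in the exercise these two steps are meant to be separate scripts, so treat this as a compact illustration of the logic only.

```python
#!/usr/bin/env python
"""translate.py: hypothetical script printing six-frame translations as FASTA."""
import sys

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codons enumerated in TCAG order, '*' marks a stop.
CODON_TABLE = dict(zip(
    (a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS))
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def read_fasta(lines):
    """Yield (header, sequence) tuples from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def translate(seq):
    """Translate complete codons only; unknown codons (e.g. with N) become 'X'."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frames(seq):
    """Yield (frame_label, protein) for the three offsets on each strand."""
    forward = seq.upper()
    reverse = forward.translate(COMPLEMENT)[::-1]
    for sign, strand in (("+", forward), ("-", reverse)):
        for offset in range(3):
            yield f"{sign}{offset + 1}", translate(strand[offset:])


def has_internal_stop(protein):
    """True if a stop codon occurs anywhere before the last residue."""
    return "*" in protein[:-1]


def main():
    # Hypothetical flag: drop translations that contain an internal stop.
    skip_stops = "--no-internal-stop" in sys.argv[1:]
    for header, seq in read_fasta(sys.stdin):
        name = header.split()[0] if header else "unnamed"
        for frame, protein in six_frames(seq):
            if skip_stops and has_internal_stop(protein):
                continue
            print(f">{name}|frame{frame}\n{protein}")


if __name__ == "__main__":
    main()
```

Keeping has_internal_stop as its own function makes it easy to move into a third script that reads the translated FASTA stream and filters it.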


In [ ]:

Exercise 2: graph theory, manipulation and analysis

Can you write (and chain) the following scripts:

  • read a stream of text from the STRING website (e.g. using curl) and filter the interactions by their STRING score
    • you can also filter by specific sub-scores; bonus points if you can make the option generic enough
  • read the filtered STRING interactions and build a graph, outputting the degree of each node
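A generic score filter could be sketched as follows. It assumes the input is a whitespace-separated table whose first line is a header naming the columns (for example protein1, protein2 and combined_score), which you should check against the file you actually download; the file name filter_string.py and the positional arguments are hypothetical.

```python
#!/usr/bin/env python
"""filter_string.py: hypothetical filter for whitespace-separated score tables."""
import sys


def filter_by_score(lines, column="combined_score", threshold=800):
    """Yield the header, then every row whose `column` value exceeds `threshold`."""
    lines = iter(lines)
    header = next(lines, None)
    if header is None:  # empty input: nothing to do
        return
    # Look the column up by name so any sub-score can be used as a filter.
    index = header.split().index(column)
    yield header.rstrip("\n")
    for line in lines:
        fields = line.split()
        if len(fields) > index and float(fields[index]) > threshold:
            yield line.rstrip("\n")


def main():
    # Assumed usage: python filter_string.py [column] [threshold]
    column = sys.argv[1] if len(sys.argv) > 1 else "combined_score"
    threshold = float(sys.argv[2]) if len(sys.argv) > 2 else 800
    for row in filter_by_score(sys.stdin, column, threshold):
        print(row)


if __name__ == "__main__":
    main()
```

Because the column is selected by its header name, the same script handles the combined score and any sub-score present in the table. If the downloaded file is compressed, decompress it in the pipe (for instance with gunzip -c) before it reaches the script.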

You can use the E. coli STRING network as an example; you can download it here. You can decide to parse the whole network, or to filter for high-scoring interactions (i.e. combined score > 800).
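The graph-building step can be done with plain dictionaries, as in the minimal sketch below. It assumes the first two whitespace-separated fields of each line are the interacting proteins and that a header line, if present, starts with protein1; the file name degree.py is hypothetical.

```python
#!/usr/bin/env python
"""degree.py: hypothetical script that prints the degree of each node."""
import sys
from collections import defaultdict


def build_graph(lines):
    """Return an undirected graph as a dict mapping node -> set of neighbours."""
    graph = defaultdict(set)
    for line in lines:
        fields = line.split()
        if len(fields) < 2 or fields[0] == "protein1":
            continue  # skip blank lines and the assumed header line
        a, b = fields[0], fields[1]
        # Sets ensure an interaction listed in both directions counts once.
        graph[a].add(b)
        graph[b].add(a)
    return graph


def degrees(graph):
    """Return a dict mapping each node to its number of distinct neighbours."""
    return {node: len(neighbours) for node, neighbours in graph.items()}


def main():
    result = degrees(build_graph(sys.stdin))
    # Highest degree first, ties broken alphabetically for stable output.
    for node, degree in sorted(result.items(), key=lambda item: (-item[1], item[0])):
        print(f"{node}\t{degree}")


if __name__ == "__main__":
    main()
```

Storing neighbours in sets rather than counting lines keeps the degree correct even when the same pair appears more than once in the input.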


In [ ]: